Abstract

We study the problem of generating refer- 
ring expressions modulo different notions 
of expressive power. We define the notion of
$\mathcal{L}$-referring expression, for a formal
language $\mathcal{L}$ equipped with a semantics in
terms of relational models. We show that 
the approach is independent of the partic- 
ular algorithm used to generate the refer- 
ring expression by providing examples us- 
ing the frameworks of (Areces et al., 2008) 
and (Krahmer et al., 2003). We provide 
some new complexity bounds, discuss the 
issue of the length of the generated de- 
scriptions, and propose ways in which the 
two approaches can be combined. 

1 Generating referring expressions 

The generation of referring expressions (GRE), that is, given a context and an element in that context, generating a grammatically correct expression in a given natural language that uniquely represents the element, is a basic task in natural language generation and an area of active research (see (Dale, 1989; Dale and Haddock, 1991; Dale and Reiter, 1995; Stone, 2000; van Deemter, 2002) among others). Most of the work in this area focuses on the content determination problem (i.e., finding a collection of properties that singles out the target object from the remaining objects in the context) and leaves the actual realization (i.e., expressing a given content as a grammatically correct expression) to standard techniques.^1

However, there is yet no general agreement on 
the basic representation of both the input and the 
output to the problem; this is handled in a rather 
ad-hoc way by each new proposal instead. 

^1 For exceptions to this practice see, e.g., (Horacek, 1997; Stone and Webber, 1998).



(Krahmer et al., 2003) make the case for the use of labeled directed graphs in the context of this problem: graphs are abstract enough to express a large number of domains and there are many attractive and well-known algorithms for dealing with this type of structure. Indeed, labeled directed graphs are nothing other than an alternative representation of relational models, used to provide semantics for formal languages like first and higher-order logics, modal logics, etc. Even valuations, the basic models of propositional logic, can be seen as one-point labeled graphs. It is not surprising, then, that they are well suited to the task.

In this article, we side with (Krahmer et al., 2003) and use labeled graphs as input, but we argue that an important notion has been left out when making this decision. Exactly because of their generality, graphs do not define, by themselves, a unique notion of sameness. When do we say that two nodes in the graph can or cannot be referred to uniquely in terms of their properties? This question only makes sense once we fix a certain level of expressiveness which determines when two graphs, or two elements in the same graph, are equivalent.

Investigating the GRE problem in terms of dif- 
ferent notions of expressiveness is the main goal 
of this paper. In §2, we will show alternative 
(but equivalent) ways in which different degrees 
of expressiveness can be defined, and discuss how 
choosing the adequate expressiveness has an im- 
pact on the number of instances of the GRE prob- 
lem that have a solution (less expressive logics can 
distinguish fewer instances); the computational 
complexity of the GRE algorithms involved; and 
the complexity of the surface realization problem. 

We maintain that this perspective is independent of the particular GRE algorithm being used. Our work fits naturally with the approach of (Areces et al., 2008) as we show in §3, where we also answer an open question concerning the complexity of the GRE problem for the language $\mathcal{EL}$. In §4 we turn to the subgraph-based algorithm of (Krahmer et al., 2003), and show how to generalize it for other notions of sameness. In §5 we show how one can combine the approaches of the previous two sections and in §6 we discuss the size of the referring expressions relative to the expressiveness employed.

Figure 1: Graph representation of scene $\mathcal{S}$.

2 Measuring expressive power 

Relational structures are notably appropriate for representing situations or scenes. A relational structure (also called "relational model") is a nonempty set of objects (the domain) together with a collection of relations, each with a fixed arity.

Formally, assume a fixed and finite (but otherwise arbitrary) vocabulary of n-ary relation symbols.^2 Define a relational model $\mathcal{M}$ as a tuple $\langle \Delta, \|\cdot\| \rangle$ where $\Delta$ is a nonempty set, and $\|\cdot\|$ is a suitable interpretation function, that is, $\|r\| \subseteq \Delta^n$ for every n-ary relation symbol r. We say that $\mathcal{M}$ is finite whenever $\Delta$ is finite. The size of a model $\mathcal{M}$ is the sum $\#\Delta + \#\|\cdot\|$, where $\#\Delta$ is the cardinality of $\Delta$ and $\#\|\cdot\|$ is the sum of the sizes of all relations in $\|\cdot\|$.

Figure 1 shows the representation of a scene with the relational model $\mathcal{S} = \langle \Delta, \|\cdot\| \rangle$:

$\Delta = \{a, b, c, d, e\}$
$\|dog\| = \{a, b, d\}$        $\|cat\| = \{c, e\}$
$\|beagle\| = \{d\}$           $\|small\| = \{b, c, d\}$
$\|sniffs\| = \{(a,a), (b,a), (c,b), (d,e), (e,d)\}$

Intuitively, a, b and d are dogs, while c and e are cats; d is a small beagle; b and c are also small. We read sniffs(d, e) as "d is sniffing e".
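As an illustration, the scene of Figure 1 can be written down directly as data. The following is a minimal sketch, assuming a plain-dictionary encoding; the names `DOMAIN`, `UNARY`, `BINARY` and `model_size` are ours and do not come from the paper. It also computes the size of the model as defined above.

```python
# Scene S from Figure 1, encoded as plain Python sets (an illustrative encoding).
DOMAIN = {"a", "b", "c", "d", "e"}
UNARY = {
    "dog": {"a", "b", "d"},
    "cat": {"c", "e"},
    "beagle": {"d"},
    "small": {"b", "c", "d"},
}
BINARY = {
    "sniffs": {("a", "a"), ("b", "a"), ("c", "b"), ("d", "e"), ("e", "d")},
}


def model_size(domain, unary, binary):
    """Size of a model: cardinality of the domain plus the sizes of all relations."""
    relations = list(unary.values()) + list(binary.values())
    return len(domain) + sum(len(ext) for ext in relations)


print(model_size(DOMAIN, UNARY, BINARY))  # 5 elements + 14 tuples = 19
```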

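Logical languages are fitting for the task of (formally) describing elements of a relational structure. Consider, e.g., the classical language of first-order logic (with equality), $\mathcal{FL}$, given by:

$\top \mid x_i \not\approx x_j \mid r(\bar{x}) \mid \neg\gamma \mid \gamma \wedge \gamma' \mid \exists x_i.\gamma$

where $\gamma, \gamma' \in \mathcal{FL}$, r is an n-ary relation symbol and $\bar{x}$ is an n-tuple of variables. As usual, $\gamma \vee \gamma'$ and $\forall x.\gamma$ are short for $\neg(\neg\gamma \wedge \neg\gamma')$ and $\neg\exists x.\neg\gamma$, respectively. Formulas of the form $\top$, $x_i \not\approx x_j$ and $r(\bar{x})$ are called atoms.^3 Given a relational model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$ and a formula $\gamma$ with free variables^4 among $x_1 \ldots x_n$, we inductively define the extension or interpretation of $\gamma$ as the set of n-tuples $\|\gamma\|^n \subseteq \Delta^n$ that satisfy:

$\|\top\|^n = \Delta^n$
$\|x_i \not\approx x_j\|^n = \{\bar{a} \mid \bar{a} \in \Delta^n, a_i \neq a_j\}$
$\|r(x_{i_1}, \ldots, x_{i_k})\|^n = \{\bar{a} \mid \bar{a} \in \Delta^n, (a_{i_1}, \ldots, a_{i_k}) \in \|r\|\}$
$\|\neg\delta\|^n = \Delta^n \setminus \|\delta\|^n$
$\|\delta \wedge \theta\|^n = \|\delta\|^n \cap \|\theta\|^n$
$\|\exists x_i.\delta\|^n = \{\bar{a} \mid \bar{a}a_{n+1} \in \|\delta'\|^{n+1}\}$

where $1 \leq i, j, i_1, \ldots, i_k \leq n$, $\bar{a} = (a_1 \ldots a_n)$, $\bar{a}a_{n+1} = (a_1 \ldots a_{n+1})$ and $\delta'$ is obtained by replacing all occurrences of $x_i$ in $\delta$ by $x_{n+1}$. When the cardinality of the tuples involved is known from context we will just write $\|\gamma\|$ instead of $\|\gamma\|^n$.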

With a language syntax and semantics in place, we can now formally define the problem of $\mathcal{L}$-GRE for a target set of elements (we slightly adapt the definition in (Areces et al., 2008)):

$\mathcal{L}$-GRE Problem
Input: a model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$ and a target set $T \subseteq \Delta$.
Output: a formula $\varphi \in \mathcal{L}$ such that $\|\varphi\| = T$, or $\bot$ if such a formula does not exist.

In case the output is not $\bot$, we say that $\varphi$ is an $\mathcal{L}$-referring expression ($\mathcal{L}$-RE) for T in $\mathcal{M}$.

2.1 Choosing the appropriate language 

Given a model $\mathcal{M}$, there will be an infinite number of formulas that uniquely describe a target (e.g., if $\varphi$ describes a target T, then $\varphi \wedge \varphi$ trivially describes T as well; even formulas which are not logically equivalent might have the same interpretation once the model is fixed). Despite having the same interpretation in $\mathcal{M}$, they may be quite different with respect to other parameters.

To start with, and as is well known in the automated text generation community, different realizations of the same content might result in expressions that are more or less appropriate. Although, as



^2 We do not consider constants or functions as they can be represented as relations of adequate arity.

^3 Notice that we include the inequality symbol $\not\approx$ as primitive. Equality can be defined using negation.

^4 W.l.o.g. we assume that no variable appears both free and bound, that no variable is bound twice, and that the index of bound variables in a formula increases from left to right.

^5 Similarly we can use formulas with two free variables to describe binary relations, etc. Throughout this paper though, we only discuss the GRE for single elements.



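$\gamma_1$ : $dog(x) \wedge small(x) \wedge \exists y.(sniffs(x,y) \wedge dog(y))$
$\gamma_2$ : $dog(x) \wedge small(x) \wedge \forall y.(\neg cat(y) \vee \neg sniffs(x,y))$
$\gamma_3$ : $dog(x) \wedge \exists y.(x \not\approx y \wedge dog(y) \wedge sniffs(x,y))$
$\gamma_4$ : $dog(x) \wedge \exists y.(cat(y) \wedge small(y) \wedge sniffs(y,x))$

Table 1: Descriptions for b in Figure 1.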

we mentioned in the introduction, we will only address the content determination (and not the surface realization) part of the GRE problem, generating content using languages with different expressive power can have an impact on the subsequent surface realization step.

Let us consider again the scene in Figure 1. Formulas $\gamma_1$ to $\gamma_4$ shown in Table 1 are all such that $\gamma_i$ uniquely describes b in model $\mathcal{S}$ (i.e., $\|\gamma_i\| = \{b\}$).
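The claim about Table 1 can be checked mechanically. The sketch below hard-codes the four formulas as Python predicates over the scene of Figure 1 (quantifiers become `any`/`all` over the domain); the encoding and the helper name `extension` are illustrative assumptions, not part of the original presentation.

```python
# Scene S from Figure 1 (only the relations used by the four formulas).
A = {"a", "b", "c", "d", "e"}
dog, cat, small = {"a", "b", "d"}, {"c", "e"}, {"b", "c", "d"}
sniffs = {("a", "a"), ("b", "a"), ("c", "b"), ("d", "e"), ("e", "d")}


def gamma1(x):  # small dog that sniffs a dog
    return x in dog and x in small and any((x, y) in sniffs and y in dog for y in A)


def gamma2(x):  # small dog such that everything it sniffs is not a cat
    return x in dog and x in small and all(y not in cat or (x, y) not in sniffs for y in A)


def gamma3(x):  # dog sniffing another dog
    return x in dog and any(x != y and y in dog and (x, y) in sniffs for y in A)


def gamma4(x):  # dog that is sniffed by a small cat
    return x in dog and any(y in cat and y in small and (y, x) in sniffs for y in A)


def extension(formula):
    """Set of domain elements satisfying a one-free-variable formula."""
    return {x for x in A if formula(x)}


for g in (gamma1, gamma2, gamma3, gamma4):
    print(g.__name__, extension(g))  # each extension is {'b'}
```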

Arguably, $\gamma_1$ can be easily realized as "the small dog that sniffs a dog". Syntactically, $\gamma_1$ is characterized as a positive, conjunctive, existential formula (i.e., it contains no negation and uses only conjunction and existential quantification). Expressions with these characteristics are, by and large, the most commonly found in corpora such as those compiled in (Viethen and Dale, 2006; van Deemter et al., 2006; Dale and Viethen, 2009). $\gamma_2$, on the other hand, contains negation, disjunction and universal quantification and could be realized as "the small dog that only sniffs things that are not cats", which sounds very unnatural. Even a small change in the form of $\gamma_2$ makes it more palatable: rewrite it using only $\exists$, $\neg$ and $\wedge$ to obtain "the small dog that is not sniffing a cat". Similarly, formulas $\gamma_3$ and $\gamma_4$ seem computationally harder to realize than $\gamma_1$: $\gamma_3$ because it contains an inequality ("the dog sniffing another dog") and $\gamma_4$ because the quantified object appears in the first argument position in the binary relation ("the dog that is sniffed by a small cat").

Summing up, even without taking into account fundamental linguistic aspects that make certain realizations preferable, such as saliency or the cognitive capacity of the hearer (can she recognize a beagle from another kind of dog?), we can ensure certain properties of the generated referring expression during content determination.

Concretely, let $\mathcal{FL}^-$ be the fragment of $\mathcal{FL}$-formulas where the operator $\neg$ does not occur.^6 By restricting content determination to $\mathcal{FL}^-$, we ensure that formulas like $\gamma_2$ will not be generated. If we also (or, alternatively) ban $\not\approx$ from the language, $\gamma_3$ is precluded. And we need not restrict ourselves to explicit fragments of first-order logic: many logical languages are known to be expressively equivalent to fragments of first-order logic. For example, the language of the description logic $\mathcal{ALC}$ (Baader et al., 2003), given by:

$\top \mid p \mid \neg\gamma \mid \gamma \wedge \gamma' \mid \exists r.\gamma$

(where $\gamma, \gamma' \in \mathcal{ALC}$) corresponds to a syntactic fragment of $\mathcal{FL}$ without $\not\approx$, as shown by the standard translation to first-order logic $\tau_{x_i}$:

$\tau_{x_i}(\top) = \top$
$\tau_{x_i}(p) = p(x_i)$
$\tau_{x_i}(\gamma_1 \wedge \gamma_2) = \tau_{x_i}(\gamma_1) \wedge \tau_{x_i}(\gamma_2)$
$\tau_{x_i}(\exists r.\gamma) = \exists x_{i+1}.(r(x_i, x_{i+1}) \wedge \tau_{x_{i+1}}(\gamma))$

Hence, by restricting content generation to $\mathcal{ALC}$ we would avoid formulas like $\gamma_3$ (no equality) and $\gamma_4$ (the quantified element always appears in second argument position).

^6 But notice that atoms $x_i \not\approx x_j$ are permitted.
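The standard translation is a simple structural recursion, which the following sketch spells out. It assumes a hypothetical tuple encoding of description-logic formulas (`("top",)`, `("atom", p)`, `("not", f)`, `("and", f, g)`, `("ex", r, f)`) and prints first-order formulas with ASCII connectives; the clause for negation, handled homomorphically, is our assumption.

```python
def translate(formula, i=1):
    """Translate a description-logic formula into a first-order string
    whose only free variable is x<i> (illustrative tuple encoding)."""
    kind = formula[0]
    if kind == "top":
        return "T"
    if kind == "atom":
        return f"{formula[1]}(x{i})"
    if kind == "not":  # assumed clause: negation commutes with the translation
        return f"~{translate(formula[1], i)}"
    if kind == "and":
        return f"({translate(formula[1], i)} & {translate(formula[2], i)})"
    if kind == "ex":
        j = i + 1  # the fresh variable always lands in second argument position
        return f"Ex{j}.({formula[1]}(x{i},x{j}) & {translate(formula[2], j)})"
    raise ValueError(f"unknown constructor: {kind!r}")


dog_sniffing_dog = ("and", ("atom", "dog"), ("ex", "sniffs", ("atom", "dog")))
print(translate(dog_sniffing_dog))
# (dog(x1) & Ex2.(sniffs(x1,x2) & dog(x2)))
```

Note that the recursion can never produce an inequality atom, nor place the quantified variable in first argument position, which is why formulas such as the third and fourth rows of Table 1 fall outside the image of the translation.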

(Areces et al., 2008) discuss generation in terms of different description logics like $\mathcal{ALC}$ and $\mathcal{EL}$ ($\mathcal{ALC}$ without negation). In this article, we will extend the results in that paper, considering for instance $\mathcal{EL}^+$ ($\mathcal{ALC}$ with negation allowed only in front of unary relations) but, more generally, we argue that the primary question is not whether one should use one or another (description) logic for content generation, but rather which semantic differences one cares about. This determines the required logical formalism which, in turn, impacts both content determination and surface realization.

We have mentioned several logical languages (and there are many more alternatives, like allowing disjunctions, counting quantifiers, etc.). Each language can be seen as a compromise between expressiveness, realizability and computational complexity. Therefore, the appropriate selection for a particular GRE task should depend on the actual context. Moreover, as we will see, the move from one logical language to another affects not only the shape of the formulas that can be generated but also the computational complexity of the generation problem, and its success, i.e., when it will be possible to uniquely identify a given target.

2.2 Defining sameness 

For any given logical language $\mathcal{L}$, we say that u is distinguishable (in $\mathcal{L}$) from v whenever there exists an $\mathcal{L}$-formula $\gamma$ such that u makes $\gamma$ true while v makes it false. Formally, let $\mathcal{M}_1 = \langle \Delta_1, \|\cdot\|_1 \rangle$ and $\mathcal{M}_2 = \langle \Delta_2, \|\cdot\|_2 \rangle$ be two relational models with $u \in \Delta_1$ and $v \in \Delta_2$; we follow the terminology of (Areces et al., 2008) and say that "u is $\mathcal{L}$-similar to v" (notation $u \rightsquigarrow_{\mathcal{L}} v$) whenever $u \in \|\gamma\|_1$ implies $v \in \|\gamma\|_2$, for every $\gamma \in \mathcal{L}$.^7 $\mathcal{L}$-similarity is reflexive for all $\mathcal{L}$, and symmetric for languages that contain negation.

Observe that $\mathcal{L}$-similarity captures the notion of 'indistinguishability' (in $\mathcal{L}$). One can take $\mathcal{M}_1$ and $\mathcal{M}_2$ to be the same model and in that case, if $u \rightsquigarrow_{\mathcal{L}} v$ for $u \neq v$, the $\mathcal{L}$-content determination problem for u will not succeed (since for every $\gamma \in \mathcal{L}$, $\|\gamma\| \neq \{u\}$).

Fortunately, one need not consider infinitely many $\mathcal{L}$-formulas to decide whether u is $\mathcal{L}$-similar to v. We can instead reinterpret $\mathcal{L}$-similarity in terms of standard model-theoretic notions like isomorphisms or bisimulations,^8 which describe structural properties of the model. We will use the term $\mathcal{L}$-simulation to refer to the suitable notion for $\mathcal{L}$ (notation $u \underline{\to}_{\mathcal{L}} v$); in the case of the languages we are considering they can be defined in a modular way. Given a relation $\sim\ \subseteq \Delta_1 \times \Delta_2$, it may or may not possess the properties we call $atom_L$, $atom_R$, $rel_L$, $rel_R$, $inj_L$ and $inj_R$ (see below). Table 2 defines various $\mathcal{L}$-simulations in terms of these.

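$atom_L$: If $u_1 \sim u_2$, then $u_1 \in \|p\|_1 \Rightarrow u_2 \in \|p\|_2$.
$atom_R$: If $u_1 \sim u_2$, then $u_2 \in \|p\|_2 \Rightarrow u_1 \in \|p\|_1$.
$rel_L$: If $u_1 \sim u_2$ and $(u_1, v_1) \in \|r\|_1$, then $v_1 \sim v_2$ and $(u_2, v_2) \in \|r\|_2$, for some $v_2$.
$rel_R$: If $u_1 \sim u_2$ and $(u_2, v_2) \in \|r\|_2$, then $v_1 \sim v_2$ and $(u_1, v_1) \in \|r\|_1$, for some $v_1$.
$inj_L$: $\sim\ : \Delta_1 \to \Delta_2$ is an injective function.
$inj_R$: $\sim^{-1} : \Delta_2 \to \Delta_1$ is an injective function.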

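The following is a fundamental model-theoretic result (Ebbinghaus et al., 1996; Kurtonina and de Rijke, 1999; Blackburn et al., 2001):

Theorem 1. If $\mathcal{M}_1$ and $\mathcal{M}_2$ are finite models, $u \in \Delta_1$ and $v \in \Delta_2$, then $u \rightsquigarrow_{\mathcal{L}} v$ iff $u \underline{\to}_{\mathcal{L}} v$.

The implication from $\mathcal{L}$-similarity to $\mathcal{L}$-simulation does not hold in general on infinite models. Notice that $\underline{\to}_{\mathcal{FL}}$ corresponds to relational model isomorphism while $\underline{\to}_{\mathcal{ALC}}$ corresponds to the notion of bisimulation.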

^7 For $\gamma \in \mathcal{ALC}$ and its sublanguages, $\|\gamma\| = \|\tau_{x_1}(\gamma)\|$.

^8 For the rest of the article, we will focus on relational models with only unary and binary relation symbols. These are the usual models of interest when describing scenes such as the one presented in Figure 1. Accommodating relations of higher arity poses only notational problems.

Table 2: Simulations for various logics.



$\mathcal{L}$-simulations allow us to determine, in an effective way, when an object is indistinguishable from another in a given model with respect to $\mathcal{L}$.

For example, it is easy to see that $a \underline{\to}_{\mathcal{EL}} b$ in the model of Figure 1 (simply verify that the relation $\sim$ such that $a \sim b$ and $x \sim x$ (for $x \in \Delta$) satisfies $atom_L$ and $rel_L$). Using Theorem 1 we conclude that, intuitively, no $\mathcal{EL}$-formula can distinguish "a dog sniffing itself" from "a dog sniffing (another) dog sniffing itself". Similarly, no $\mathcal{FL}$-formula will distinguish two $\mathcal{FL}$-similar (isomorphic) objects.
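The verification suggested in the example is easy to automate. The sketch below checks the two properties used there, here called `satisfies_atom_l` and `satisfies_rel_l` (our names), for a candidate relation over the scene of Figure 1; both sides of the relation are taken from the same model, as in the example.

```python
UNARY = {
    "dog": {"a", "b", "d"},
    "cat": {"c", "e"},
    "beagle": {"d"},
    "small": {"b", "c", "d"},
}
BINARY = {"sniffs": {("a", "a"), ("b", "a"), ("c", "b"), ("d", "e"), ("e", "d")}}


def satisfies_atom_l(rel, unary):
    """If u1 ~ u2, every unary relation that holds of u1 also holds of u2."""
    return all(u2 in ext
               for (u1, u2) in rel
               for ext in unary.values() if u1 in ext)


def satisfies_rel_l(rel, binary):
    """If u1 ~ u2 and r(u1, v1), then r(u2, v2) and v1 ~ v2 for some v2."""
    for (u1, u2) in rel:
        for pairs in binary.values():
            for (x, v1) in pairs:
                if x == u1 and not any(y == u2 and (v1, v2) in rel
                                       for (y, v2) in pairs):
                    return False
    return True


identity = {(x, x) for x in "abcde"}
candidate = identity | {("a", "b")}
print(satisfies_atom_l(candidate, UNARY), satisfies_rel_l(candidate, BINARY))
```

Swapping the extra pair to `("b", "a")` breaks the first property, since b is small and a is not; this asymmetry is expected for a language without negation.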

There are well-known algorithms to compute certain kinds of $\mathcal{L}$-simulations (Hopcroft, 1971; Paige and Tarjan, 1987; Henzinger et al., 1995; Dovier et al., 2004). We will discuss this in more detail in the next section. Polynomial-time algorithms can be obtained for the simulations of many languages, such as $\mathcal{ALC}$, $\mathcal{ALC}$ with inverse relations, $\mathcal{EL}^+$ and $\mathcal{EL}$; however, no polynomial-time algorithms are known for $\mathcal{FL}$- and $\mathcal{FL}^-$-simulation (indeed, even their exact complexity class is not known (Garey and Johnson, 1979)).

3 GRE via simulator sets 

Given a model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$, Theorem 1 tells us that if two distinct elements u and v in $\Delta$ are such that $u \underline{\to}_{\mathcal{L}} v$ then every $\mathcal{L}$-formula that is true at u is also true at v. Hence there is no formula in $\mathcal{L}$ that can uniquely refer to u. From this perspective, deciding whether the model contains an element that is $\mathcal{L}$-similar to but distinct from u is equivalent to deciding whether there exists an $\mathcal{L}$-RE for u.

We begin by considering this task. Assume fixed a language $\mathcal{L}$ and a model $\mathcal{M}$. Suppose we want to refer to an element v in the domain of $\mathcal{M}$. We would like to compute the simulator set of v, defined as $sim_{\mathcal{L}}(v) = \{u \mid v \underline{\to}_{\mathcal{L}} u\}$. If $sim_{\mathcal{L}}(v)$ is not the singleton $\{v\}$, we cannot $\mathcal{L}$-refer to v.

(Henzinger et al., 1995) propose an algorithm to compute the set $sim_{\mathcal{EL}^+}(v)$ for each element v of a given finite model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$^9 in time $O(\#\Delta \times \#\|\cdot\|)$. Intuitively, this algorithm defines S(v) as the set of candidates for simulating v and successively refines it by removing those elements which do not simulate v. At the end, $S(v) = sim_{\mathcal{EL}^+}(v)$.

^9 Actually the algorithm proposed in (Henzinger et al., 1995) is over labeled graphs, but it can be adapted to compute $sim_{\mathcal{EL}^+}$ by appropriately labeling the model.

The algorithm can be adapted to compute $sim_{\mathcal{L}}$ for other $\mathcal{L}$. In particular, we can use it to compute $sim_{\mathcal{EL}}$ in polynomial time, which will give us the basic algorithm for establishing an upper bound on the complexity of the $\mathcal{EL}$-GRE problem (this will answer an open question of (Areces et al., 2008)).

Let us first introduce some notation. We fix $\mathcal{P}$ as the set of all unary relations of the language $\mathcal{EL}$. For $v \in \Delta$ let $P(v) = \{p \in \mathcal{P} \mid v \in \|p\|\}$ and let $suc_r(v) = \{u \in \Delta \mid (v, u) \in \|r\|\}$ for any binary relation r present in the language.

The pseudo-code is shown in Algorithm 1. We initialize S(v) with the set of all elements $u \in \Delta$ such that $P(v) \subseteq P(u)$, i.e., the set of all elements satisfying at least the same unary relations as v (this guarantees that $atom_L$ holds). At each step, if there are three elements u, v and w such that for some relation r, $(u, v) \in \|r\|$, $w \in S(u)$ (i.e., w is a candidate to simulate u) but $suc_r(w) \cap S(v) = \emptyset$ (there is no element w' such that $(w, w') \in \|r\|$ and $w' \in S(v)$) then clearly condition $rel_L$ is not satisfied under the supposition that $sim_{\mathcal{EL}} = S$. S is 'too big' because w cannot simulate u. Hence w is removed from S(u).

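Algorithm 1: Computing $\mathcal{EL}$-similarity

input : a finite model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$
output: $\forall v \in \Delta$, the simulator set $sim_{\mathcal{EL}}(v) = S(v)$

foreach $v \in \Delta$ do
    $S(v) := \{u \in \Delta \mid P(v) \subseteq P(u)\}$
while $\exists r, u, v, w : (u, v) \in \|r\|,\ w \in S(u),\ suc_r(w) \cap S(v) = \emptyset$ do
    $S(u) := S(u) \setminus \{w\}$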
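A direct, unoptimized rendering of Algorithm 1 is sketched below. It follows the naive refinement loop rather than the more efficient bookkeeping of (Henzinger et al., 1995); the function name `simulator_sets` and the dictionary encoding of the model are assumptions made for the example, which is run on the scene of Figure 1.

```python
def simulator_sets(domain, unary, binary):
    """Naive version of Algorithm 1: refine S(v) until no triple (u, v, w)
    with r(u, v), w in S(u) and suc_r(w) disjoint from S(v) remains."""
    P = {v: {p for p, ext in unary.items() if v in ext} for v in domain}
    S = {v: {u for u in domain if P[v] <= P[u]} for v in domain}
    changed = True
    while changed:
        changed = False
        for pairs in binary.values():
            for (u, v) in pairs:
                for w in list(S[u]):
                    successors = {y for (x, y) in pairs if x == w}
                    if not (successors & S[v]):
                        S[u].discard(w)  # w cannot simulate u
                        changed = True
    return S


DOMAIN = {"a", "b", "c", "d", "e"}
UNARY = {"dog": {"a", "b", "d"}, "cat": {"c", "e"},
         "beagle": {"d"}, "small": {"b", "c", "d"}}
BINARY = {"sniffs": {("a", "a"), ("b", "a"), ("c", "b"), ("d", "e"), ("e", "d")}}

S = simulator_sets(DOMAIN, UNARY, BINARY)
print({v: sorted(S[v]) for v in sorted(S)})
```

On this scene every simulator set is a singleton except that of a, which also contains b; so, under this encoding, a is the only element without a referring expression in the negation-free language.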



Of course, Algorithm 1 will only tell us whether a referring expression for an element v exists (that is, whenever $sim_{\mathcal{EL}}(v) = \{v\}$). It does not compute an $\mathcal{EL}$-formula $\varphi$ that uniquely refers to v. But Algorithm 1 is an instance of a family of well-known algorithms that compute $\mathcal{L}$-simulations by successively refining an over-approximation of the simulator sets. The "reason" behind each refinement can be encoded using an $\mathcal{L}$-formula; intuitively, nodes that do not satisfy it are being removed from the simulator set on each refinement.

Using this insight, one can transform an algorithm that computes $\mathcal{L}$-simulator sets into one that additionally computes an $\mathcal{L}$-RE for each set. (Areces et al., 2008) used this approach to derive their $\mathcal{ALC}$-GRE method from a well-known algorithm for computing $\mathcal{ALC}$-simulation (i.e., bisimulation) sets, but failed to notice they could derive one for $\mathcal{EL}$ analogously.

Algorithm 2 shows a transformed version of Algorithm 1 following this principle. The idea is that each node $v \in \Delta$ is now tagged with a formula F(v) of $\mathcal{EL}$. The formulas F(v) are updated along the execution of the loop, whose invariant ensures that $v \in \|F(v)\|$ and $\|F(u)\| \subseteq S(u)$ hold for all $u, v \in \Delta$.

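Algorithm 2: Computing $\mathcal{EL}$-similarity and $\mathcal{EL}$-RE

input : a finite model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$
output: F, the set of $\mathcal{EL}$-formulas, and S, the simulator sets s.t. $(\forall v \in \Delta)\ \|F(v)\| = S(v) = sim_{\mathcal{EL}}(v)$

foreach $v \in \Delta$ do
    $S(v) := \{u \in \Delta \mid P(v) \subseteq P(u)\}$
    $F(v) := \bigwedge P(v)$
while $\exists r, u, v, w : (u, v) \in \|r\|,\ w \in S(u),\ suc_r(w) \cap S(v) = \emptyset$ do
    invariant $(\forall u, v)\ \|F(u)\| \subseteq S(u) \wedge v \in \|F(v)\|$
    $S(u) := S(u) \setminus \{w\}$
    if $\exists r.F(v)$ is not a conjunct of $F(u)$ then
        $F(u) := F(u) \wedge \exists r.F(v)$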


Initially F(v) is the conjunction of all the unary relations that v satisfies (if there is none, then $F(v) = \top$). Next, each time the algorithm finds r, u, v, w such that $(u, v) \in \|r\|$, $w \in S(u)$ and $suc_r(w) \cap S(v) = \emptyset$, it updates F(u) to $F(u) \wedge \exists r.F(v)$. Again this new formula $\varphi$ is in $\mathcal{EL}$ and it can be shown that $u \in \|\varphi\|$ and $w \notin \|\varphi\|$, hence witnessing that $u \underline{\to}_{\mathcal{EL}} w$ is false.
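The formula-tagging variant can be sketched in the same naive style. In the code below a formula is a tuple of conjuncts, each either a unary relation name or a triple `("ex", r, subformula)`, and the empty tuple stands for the trivially true formula; this encoding, and the names `el_referring_expressions` and `extension`, are assumptions made for the example. The evaluator is included only so that the relationship between formulas and simulator sets can be checked on the scene of Figure 1.

```python
def el_referring_expressions(domain, unary, binary):
    """Naive version of Algorithm 2: simulator sets S plus a formula F(v) per node."""
    P = {v: {p for p, ext in unary.items() if v in ext} for v in domain}
    S = {v: {u for u in domain if P[v] <= P[u]} for v in domain}
    F = {v: tuple(sorted(P[v])) for v in domain}  # conjunction of unary relations
    changed = True
    while changed:
        changed = False
        for r, pairs in binary.items():
            for (u, v) in pairs:
                for w in sorted(S[u]):
                    successors = {y for (x, y) in pairs if x == w}
                    if not (successors & S[v]):
                        S[u].discard(w)
                        conjunct = ("ex", r, F[v])  # the "reason" w was removed
                        if conjunct not in F[u]:
                            F[u] = F[u] + (conjunct,)
                        changed = True
    return S, F


def extension(formula, domain, unary, binary):
    """Elements of the domain satisfying a formula in the tuple encoding."""
    result = set(domain)
    for conjunct in formula:
        if isinstance(conjunct, str):
            result &= unary[conjunct]
        else:
            _, r, sub = conjunct
            targets = extension(sub, domain, unary, binary)
            result &= {x for (x, y) in binary[r] if y in targets}
    return result


DOMAIN = {"a", "b", "c", "d", "e"}
UNARY = {"dog": {"a", "b", "d"}, "cat": {"c", "e"},
         "beagle": {"d"}, "small": {"b", "c", "d"}}
BINARY = {"sniffs": {("a", "a"), ("b", "a"), ("c", "b"), ("d", "e"), ("e", "d")}}

S, F = el_referring_expressions(DOMAIN, UNARY, BINARY)
for v in sorted(DOMAIN):
    print(v, F[v], sorted(extension(F[v], DOMAIN, UNARY, BINARY)))
```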

Algorithm 2 can be easily modified to calculate the $\mathcal{EL}^+$-RE of each simulator set $sim_{\mathcal{EL}^+}$ by adjusting the initialization: replace $\subseteq$ by $=$ in the initialization of S(v) and initialize F(v) as $\bigwedge (P(v) \cup \overline{P}(v))$, where $\overline{P}(v) = \{\neg p \mid v \notin \|p\|\}$.

With a naive implementation Algorithm 2 executes in time $O(\#\Delta^3 \times \#\|\cdot\|^2)$, providing a polynomial solution to the $\mathcal{EL}$- and $\mathcal{EL}^+$-GRE problems. (Henzinger et al., 1995) show a more involved version of Algorithm 1 with lower complexity, which can be adapted in a similar way to compute F(v). We shall skip the details.

Theorem 2. The $\mathcal{EL}$- and $\mathcal{EL}^+$-GRE problems over $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$ have complexity $O(\#\Delta \times \#\|\cdot\|)$.

Theorem 2 answers a question left open by (Areces et al., 2008): the $\mathcal{EL}$-GRE problem can be solved in polynomial time. Note that the above result assumes a convenient representation of formulas as directed acyclic graphs (for O(1) formula construction). In §6 we will take a look at this in more detail.

We have not addressed the issue of preferences with respect to the use of certain relations, and moreover, we have presented our algorithms as close as possible to the original proposal of (Henzinger et al., 1995), which makes them prioritize unary relations over binary relations. The latter can be avoided by representing unary relations as binary relations (cf. §4). Some control over preferences can then be introduced by taking them into consideration in place of the non-deterministic choice of differentiating elements made in the main loop of the algorithm. But despite these modifications, the algorithms based on simulator sets seem to offer less room for implementing preferences than the ones we will discuss in §4.

4 GRE via building simulated models 

We now revisit the algorithm presented by (Krah- 
mer et al., 2003), identify its underlying notion 
of expressiveness, and extend it to accommodate 
other notions. For reasons of space we assume the 
reader is familiar with this algorithm and refer her 
to that article for further information. 

We must first note that scenes are encoded in that article in a slightly different way: there, graphs have only labels on edges, and non-relational attributes such as type or color are represented by loops (e.g., small(a, a)). While our presentation is, arguably, conceptually cleaner, it forces us to treat the atomic and relational cases separately.

The second thing to note is that the output of their algorithm is a connected subgraph H of the input graph G that includes the target u among its nodes. This means that the algorithm does not fit in the definition of $\mathcal{L}$-GRE we presented in §2.

Now, H must be such that every subgraph isomorphism^10 f between H and G satisfies f(u) = u. On relational models, subgraph isomorphism corresponds to $\mathcal{FL}^-$-simulations, which makes explicit the notion of expressiveness that was used. Indeed, from the output H of this algorithm, one can easily build an $\mathcal{FL}^-$-formula that uniquely describes the target u, as is shown in Algorithm 3. Observe that if $\mathcal{FL}$-simulations were used instead, we would also have to include which unary and binary relations do not hold in H.

^10 A subgraph isomorphism between $G_1$ and $G_2$ is an isomorphism between $G_1$ and a subgraph of $G_2$.



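Algorithm 3: buildFormula$_{\mathcal{FL}^-}$(H, v)
// let $H = \langle \{a_1 \ldots a_n\}, \|\cdot\| \rangle$, $v = a_1$
$\gamma := \bigwedge_{a_i \neq a_j} (x_i \not\approx x_j) \wedge \bigwedge_{(a_i, a_j) \in \|r\|} r(x_i, x_j) \wedge \bigwedge_{a_i \in \|p\|} p(x_i)$;
return $\exists x_2 \ldots \exists x_n.\gamma$;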
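To make Algorithm 3 concrete, the sketch below builds the formula as an ASCII string from a small subgraph. The function name `build_formula_fl_minus`, the list-and-dictionary encoding of H, and the choice to emit each inequality only once per unordered pair are our assumptions; the first node in the list plays the role of the target.

```python
def build_formula_fl_minus(nodes, unary, binary):
    """Describe the first node of `nodes` by the whole subgraph H:
    pairwise-distinct variables, all edges, all unary labels."""
    var = {a: f"x{i}" for i, a in enumerate(nodes, start=1)}
    conjuncts = [f"{var[a]} != {var[b]}"
                 for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    conjuncts += [f"{r}({var[u]},{var[v]})"
                  for r, pairs in binary.items() for (u, v) in sorted(pairs)]
    conjuncts += [f"{p}({var[a]})"
                  for p, ext in unary.items() for a in nodes if a in ext]
    body = " & ".join(conjuncts) if conjuncts else "T"
    prefix = "".join(f"E{var[a]}." for a in nodes[1:])  # x1 stays free
    return f"{prefix}({body})"


# A two-node subgraph of the scene in Figure 1: dog b sniffing dog a.
H_nodes = ["b", "a"]
H_unary = {"dog": {"a", "b"}}
H_binary = {"sniffs": {("b", "a")}}
print(build_formula_fl_minus(H_nodes, H_unary, H_binary))
# Ex2.(x1 != x2 & sniffs(x1,x2) & dog(x1) & dog(x2))
```

The result is, up to the order of conjuncts, the third description of Table 1.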

Having made explicit the notion of sameness underlying the algorithm of (Krahmer et al., 2003) and, with it, the logical language associated to it, we can proceed to generalize the algorithm, as shown in Algorithm 4. This algorithm is parametric on $\mathcal{L}$; to make it concrete, one needs to provide appropriate versions of buildFormula$_{\mathcal{L}}$ and extend$_{\mathcal{L}}$. In order to make the discussion of the differences with the original algorithm simpler, we list the code for buildFormula$_{\mathcal{FL}^-}$ and extend$_{\mathcal{FL}^-}$ in Algorithms 3 and 6.

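Algorithm 4: makeRE$_{\mathcal{L}}$(v)
$v_H$ := new node;
$\langle H, f \rangle := \langle \langle \{v_H\}, \emptyset, \emptyset \rangle, \{v_H \mapsto v\} \rangle$;
$H'$ := find$_{\mathcal{L}}$($v_H$, $\bot$, H, f);
return buildFormula$_{\mathcal{L}}$($H'$, $v_H$)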



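Algorithm 5: find$_{\mathcal{L}}$($v_H$, best, H, f)

if $best \neq \bot \wedge cost(best) \leq cost(H)$ then
    return best
distractors := $\{n \mid n \in \Delta_G \wedge n \neq v \wedge v_H \underline{\to}_{\mathcal{L}} n\}$;
if distractors = $\emptyset$ then
    return H
foreach $\langle H', f' \rangle \in$ extend$_{\mathcal{L}}$(H, f) do
    I := find$_{\mathcal{L}}$($v_H$, best, $H'$, $f'$);
    if $best = \bot \vee cost(I) \leq cost(best)$ then
        best := I
return best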

Notice that makeRE$_{\mathcal{L}}$ computes not only a graph H but also an $\mathcal{L}$-simulation f. In the case of $\mathcal{FL}^-$, H is a subgraph of G and, therefore, f is the trivial identity function id(x) = x. We will see the need for f when discussing the case of less expressive logics like $\mathcal{EL}$. Observe also that in extend$_{\mathcal{FL}^-}$ we follow the notation of (Krahmer et al., 2003) and write, for a model $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$, $\mathcal{M} + p(u)$ to denote the model $\langle \Delta \cup \{u\}, \|\cdot\|' \rangle$ such that $\|p\|' = \|p\| \cup \{u\}$ and $\|q\|' = \|q\|$ when $q \neq p$. Similarly, $\mathcal{M} + r(u, v)$ denotes the model $\langle \Delta \cup \{u, v\}, \|\cdot\|' \rangle$ such that $\|r\|' = \|r\| \cup \{(u, v)\}$ and $\|q\|' = \|q\|$ when $q \neq r$. It is clear, then, that this function returns all the extensions of H obtained by adding a missing attribute or relation to H, just as is done in the original algorithm.

We now discuss a version of this algorithm for $\mathcal{EL}$. The first thing to note is that one could, in principle, just use extend$_{\mathcal{FL}^-}$ also for $\mathcal{EL}$. Indeed, since find$_{\mathcal{EL}}$ uses an $\mathcal{EL}$-simulation to



Algorithm 6: extend$_{\mathcal{FL}^-}$(H, f)

a := $\{H + p(u) \mid u \in \Delta_H,\ u \in \|p\|_G \setminus \|p\|_H\}$;
b := $\{H + r(u, v) \mid u \in \Delta_H,\ (u, v) \in \|r\|_G \setminus \|r\|_H\}$;
return $(a \cup b) \times \{id\}$



compute the set of distractors (in the terminology of (Krahmer et al., 2003)), the output of this function would be a subgraph H of G such that for every $\mathcal{EL}$-simulation $\sim$, $u \sim v$ iff $u = v$. The problem is that this subgraph may contain cycles and, as was observed in §2, they cannot be told apart using $\mathcal{EL}$. The upshot is that we might be unable to realize the outcome of such a function.

A well-known result establishes that every relational model $\mathcal{M}$ is equivalent, with respect to $\mathcal{EL}$-formulas,^11 to the unraveling of $\mathcal{M}$ (cf. (Blackburn et al., 2001)). That is, any model and its unraveling satisfy exactly the same $\mathcal{EL}$-formulas. Moreover, the unraveling of $\mathcal{M}$ is always a tree, and as we show in Algorithm 7, it is straightforward to extract a suitable $\mathcal{EL}$-formula from a tree.

Algorithm 7: buildFormula$_{\mathcal{EL}}$(H, v)
requires H to be a tree

$\gamma := \{\exists r.$buildFormula$_{\mathcal{EL}}(H, u) \mid (v, u) \in \|r\|\}$;
return $(\bigwedge \gamma) \wedge (\bigwedge_{v \in \|p\|} p)$;
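Algorithm 7 amounts to a recursive walk over the tree. The following sketch renders it as a string-building function; the name `build_formula_el`, the ASCII syntax and the node names `h0`, `h1`, `h2` are hypothetical, and the unary conjuncts are emitted before the existential ones purely for readability. Termination relies on H being a tree, as the algorithm requires.

```python
def build_formula_el(v, unary, binary):
    """Formula describing node v of a tree-shaped model: its unary labels
    plus one existential conjunct per outgoing edge."""
    parts = sorted(p for p, ext in unary.items() if v in ext)
    for r in sorted(binary):
        for (x, u) in sorted(binary[r]):
            if x == v:
                parts.append(f"E{r}.({build_formula_el(u, unary, binary)})")
    return " & ".join(parts) if parts else "T"


# A three-node path, e.g. a finite piece of an unraveling:
# a small dog sniffing a dog that sniffs a dog.
TREE_UNARY = {"dog": {"h0", "h1", "h2"}, "small": {"h0"}}
TREE_BINARY = {"sniffs": {("h0", "h1"), ("h1", "h2")}}
print(build_formula_el("h0", TREE_UNARY, TREE_BINARY))
# dog & small & Esniffs.(dog & Esniffs.(dog))
```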



Therefore, we need extend$_{\mathcal{EL}}$ to return all the possible extensions of H obtained by adding either a new proposition or a new edge that is present in the unraveling of G but not in H. This is shown in Algorithm 8.

Observe that the behavior of find$_{\mathcal{EL}}$ is quite sensitive to the cost function employed. For instance, on cyclic models, a cost function that does not guarantee that the unraveling is explored in a breadth-first way may lead to non-termination (since find$_{\mathcal{EL}}$ may loop exploring an infinite branch).

It is also possible to use modal model- 
theoretical results to put a bounds check that 
avoids generating an unraveling of infinite depth 
when there is no possible referring expression, but 
we will not go into the details for reasons of space. 

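Algorithm 8: extend$_{\mathcal{EL}}$(H, f)

a := $\{\langle H + p(u), f \rangle \mid u \in \Delta_H,\ u \in \|p\|_G \setminus \|p\|_H\}$;
b := $\emptyset$;
foreach $u \in \Delta_G$ do
    foreach $u_H \in \Delta_H$ such that $(f(u_H), u) \in \|r\|_G$ do
        if $\forall v.((u_H, v) \in \|r\|_H \Rightarrow f(v) \neq u)$ then
            n := new node;
            b := $b \cup \{\langle H + r(u_H, n), f[n \mapsto u] \rangle\}$;
return $a \cup b$

^11 Actually, the result holds even for $\mathcal{ALC}$-formulas.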



As a final note on complexity, although the set of $\mathcal{EL}$-distractors may be computed more efficiently than the set of $\mathcal{FL}^-$-distractors, we cannot conclude that find$_{\mathcal{EL}}$ is more efficient than find$_{\mathcal{FL}^-}$ in general: the model built in the first case may be exponentially larger (it is an unraveling, after all).

5 Combining GRE methods 

An appealing feature of formulating the GRE problem modulo expressivity is that one can devise general strategies that combine $\mathcal{L}$-GRE algorithms. We illustrate this with an example.

The algorithms based on $\mathcal{L}$-simulator sets, like the ones in §3, simultaneously compute referring expressions for every object in the domain, and do this for many logics in polynomial time. This is an interesting property when one anticipates the need to refer to a large number of elements. However, this family of algorithms is not as flexible in terms of implementing preferences as those in §4.

There is a simple way to obtain an algorithm that is a compromise between these two techniques. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two procedures that solve the $\mathcal{L}$-GRE problem based on the techniques of §3 and §4, respectively. One can first compute an $\mathcal{L}$-RE for every possible object using $\mathcal{A}_1$ and then (lazily) replace the calculated RE for u with $\mathcal{A}_2(u)$ whenever the former does not conform to some predefined criterion. But one can do better.

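Since $\mathcal{A}_1$ computes, for a given $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$, the set sim(u) for every $u \in \Delta$, one can build in polynomial time, using the output of $\mathcal{A}_1$, the model $\mathcal{M}_{\mathcal{L}} = \langle \{[u] \mid u \in \Delta\}, \|\cdot\|_{\mathcal{L}} \rangle$, such that:

$[u] = \{v \mid u \underline{\to}_{\mathcal{L}} v \text{ and } v \underline{\to}_{\mathcal{L}} u\}$
$\|r\|_{\mathcal{L}} = \{([u_1] \ldots [u_n]) \mid (u_1 \ldots u_n) \in \|r\|\}$

$\mathcal{M}_{\mathcal{L}}$ is known as the $\mathcal{L}$-minimization of $\mathcal{M}$. By a straightforward induction on $\gamma$ one can verify that $(u_1 \ldots u_n) \in \|\gamma\|$ iff $([u_1] \ldots [u_n]) \in \|\gamma\|_{\mathcal{L}}$, and this implies that $\gamma$ is an $\mathcal{L}$-RE for u in $\mathcal{M}$ iff it is an $\mathcal{L}$-RE for [u] in $\mathcal{M}_{\mathcal{L}}$.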

If $\mathcal{M}$ has a large number of indistinguishable elements (using $\mathcal{L}$), then $\mathcal{M}_{\mathcal{L}}$ will be much smaller than $\mathcal{M}$. Since the computational complexity of $\mathcal{A}_2$ depends on the size of $\mathcal{M}$, for very large scenes one should compute $\mathcal{A}_2([u])$ instead.
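The minimization step can be sketched as a quotient construction over the simulator sets. In the code below, `sim[u]` is assumed to hold the set of elements that simulate u (the output of the first procedure), classes are represented as frozensets, and the function name `minimize` and the toy model are ours.

```python
def minimize(domain, unary, binary, sim):
    """Quotient a model by mutual simulation: [u] collects every v such that
    v simulates u and u simulates v; relations are lifted class-wise."""
    cls = {u: frozenset(v for v in domain if v in sim[u] and u in sim[v])
           for u in domain}
    q_domain = set(cls.values())
    q_unary = {p: {cls[u] for u in ext} for p, ext in unary.items()}
    q_binary = {r: {(cls[u], cls[v]) for (u, v) in pairs}
                for r, pairs in binary.items()}
    return q_domain, q_unary, q_binary


# Toy model: 1 and 2 are indistinguishable (both satisfy p and point to 3).
domain = {1, 2, 3}
unary = {"p": {1, 2}}
binary = {"r": {(1, 3), (2, 3)}}
sim = {1: {1, 2}, 2: {1, 2}, 3: {1, 2, 3}}  # simulator sets for this model

q_domain, q_unary, q_binary = minimize(domain, unary, binary, sim)
print(len(domain), "->", len(q_domain))  # 3 -> 2
```

The second, more flexible procedure can then be run on the two-element quotient instead of the original three-element model.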

6 On the size of referring expressions 

The expressive power of a language $\mathcal{L}$ determines if there is an $\mathcal{L}$-RE for an element u. But observe that when u can be described in $\mathcal{L}$, it may also influence the size of the shortest $\mathcal{L}$-RE. Intuitively, with more expressive power we are able to 'see' more differences and therefore have more resources at hand to build a shorter formula.

A natural question is, then, whether we can characterize the relative size of the $\mathcal{L}$-REs for a given $\mathcal{L}$. That is, whether we can give (tight) upper bounds for the size of the shortest $\mathcal{L}$-REs for the elements of an arbitrary model $\mathcal{M}$, as a function of the size of $\mathcal{M}$.

For the case of one of the most expressive logics considered in this article, $\mathcal{FL}^-$, the answer follows from algorithm makeRE$_{\mathcal{FL}^-}$ in §4. Indeed, if an $\mathcal{FL}^-$-RE exists, it is computed by buildFormula$_{\mathcal{FL}^-}$ from a model H that is not bigger than the input model. It is easy to see that this formula is linear in the size of H and, therefore, the size of the shortest $\mathcal{FL}^-$-RE is $O(\#\Delta + \#\|\cdot\|)$. It is not hard to see that this upper bound holds for $\mathcal{FL}$-REs too (cf. §4 for details).

Although buildFormula$_{\mathcal{EL}}$ also returns a
formula that is linear in the size of the tree-model
$\mathcal{H}$, $\mathcal{H}$ could be, in principle, exponentially larger
than the input model. We can use this to give an
exponential upper bound for the size of the shortest
$\mathcal{EL}$-RE, but is it tight?

One is tempted to conclude from Theorem 2
that the size of the shortest $\mathcal{EL}$-RE is $O(\#\Delta \times \#\|\cdot\|)$,
but there is a pitfall. Theorem 2 assumes that
formulas are represented as a DAG and guarantees
that this DAG is polynomial in the size of the input
model. One can easily reconstruct (the syntax tree
of) the formula from the DAG, but this, in principle,
may lead to an exponential blow-up (the result
will be an exponentially larger formula, but
composed of only a polynomial number of different
subformulas).

As the following example shows, it is indeed
possible to obtain an $\mathcal{EL}$-formula that is exponentially
larger when expanding the DAG representation
generated by Algorithm 2.

Example 3. Consider a language with only one
binary relation $r$, and let $\mathcal{M} = \langle \Delta, \|\cdot\| \rangle$ where
$\Delta = \{1, 2, \ldots, n\}$ and $(i, j) \in \|r\|$ iff $i < j$.
Algorithm 2 initializes $F(j) = \top$ for all $j \in \Delta$.
Suppose the following choices in the execution: for
$i = 1, \ldots, n - 1$, iterate $n - i$ times picking
$v = w = n - i + 1$ and successively $u = n - i, \ldots, 1$.
It can be shown that each time a formula $F(j)$ is
updated, it changes from $\varphi$ to $\varphi \wedge \exists r.\varphi$ and hence
it doubles its size. Since $F(1)$ is updated $n - 1$
many times, the size of $F(1)$ is greater than $2^n$.
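
The gap between the DAG and the expanded formula in Example 3 can be checked numerically. The sketch below does not run Algorithm 2; it only replays the update pattern described in the example (replacing a formula `phi` by `phi AND EXISTS r. phi`) and counts nodes in both representations. The tuple encoding of formulas, the node-counting notion of size, and the helper names `step`, `tree_size` and `dag_size` are assumptions made for illustration.

```python
TOP = ("top",)


def step(phi):
    """One update from Example 3: phi becomes (phi AND EXISTS r. phi).

    Both occurrences point at the same Python object, which mimics
    the sharing of subformulas in a DAG.
    """
    return ("and", phi, ("exists", "r", phi))


def children(phi):
    # Subformulas are the tuple-valued components; "r" is a label.
    return [c for c in phi[1:] if isinstance(c, tuple)]


def tree_size(phi, memo=None):
    """Number of nodes in the fully expanded syntax tree."""
    memo = {} if memo is None else memo
    if id(phi) not in memo:
        memo[id(phi)] = 1 + sum(tree_size(c, memo) for c in children(phi))
    return memo[id(phi)]


def dag_size(phi, seen=None):
    """Number of distinct nodes when shared subformulas are stored once."""
    seen = set() if seen is None else seen
    if id(phi) in seen:
        return 0
    seen.add(id(phi))
    return 1 + sum(dag_size(c, seen) for c in children(phi))


def iterate(k):
    """Apply the update k times, starting from the formula TOP."""
    phi = TOP
    for _ in range(k):
        phi = step(phi)
    return phi


for k in (1, 5, 10, 20):
    phi = iterate(k)
    print(k, dag_size(phi), tree_size(phi))
```

Under this encoding each update adds two nodes to the DAG but more than doubles the expanded tree, so the DAG grows linearly in the number of updates while the tree grows exponentially.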

The large $\mathcal{EL}$-RE of Example 3 is due to an
unfortunate (non-deterministic) choice of elements.
Example 4 shows that another execution leads to a
quadratic RE (but notice the shortest one is linear).

Example 4. Suppose now that in the first $n - 1$
iterations we successively choose $v = w = n - i$
and $u = w - 1$ for $i = 0, \ldots, n - 2$. It can be seen
that for convenient choices, $F(1)$ is of size $O(n^2)$.

We are as yet unable to answer whether the
exponential bound for the size of the minimum
$\mathcal{EL}$-RE is tight. We conjecture that no polynomial bound
can be given, though. In any case, it seems clear
that not only the existence of an RE but also relative
lengths should be taken into account when considering the
trade-off between expressive powers.

7 Conclusions 

There is some notion of expressiveness underlying
the formulation of every GRE problem. This
"expressiveness" can be formally measured in terms
of a logical language or, dually, a simulation
relation between models. In this article we have
discussed making the notion of expressiveness
involved an explicit parameter of the GRE problem,
unlike usual practice.

We have taken an abstract view, defining the
"$\mathcal{L}$-GRE problem"; and though we considered various
possible choices for $\mathcal{L}$, we did not argue for any of
them. Instead, we tried to make explicit the
trade-off involved in the selection of a particular $\mathcal{L}$. This,
we believe, depends heavily on the given context.

By making expressiveness explicit, we can 
transfer general knowledge and results from the 
well-developed field of computational logics. This 
was exemplified in §3 and §4 where we were able 
to turn known GRE algorithms into families of al- 
gorithms that may deal analogously with different 
logical languages. We also applied this in §5 to 
devise new heuristics. 

Arguably, an explicit notion of expressiveness 
also provides a cleaner interface, either between 
the content-determination and surface realization 
modules or between two collaborating content- 
determination modules. An instance of the latter 
was exhibited in §5. 

As a future line of research, one may want to
avoid sticking to a fixed $\mathcal{L}$ but instead favor an
incremental approach in which features of a more
expressive language $\mathcal{L}_1$ are used only when $\mathcal{L}_0$ is
not enough to distinguish a certain element.